Deep Web Classification based on Domain Feature Text
Authors
Abstract
The deep web provides a vast amount of high-quality structured data. To retrieve this data, one important task is to classify deep web sources by domain automatically. This paper presents an approach to deep web classification based on domain feature text (DFT). For DFT selection, a semantic abstraction method based on ontology knowledge and a quantitative selection criterion based on domain correlation are proposed; together they enhance the representational ability of the DFT and avoid the subjectivity and uncertainty of manual selection. For the construction of interface vectors, an improved weighting method is given to account for the different roles of individual DFT. Finally, a KNN algorithm is used to classify the interface vectors. Experiments on 160 query interfaces in 4 typical domains demonstrate the feasibility and effectiveness of the proposed approach.
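The pipeline described in the abstract (weighted interface vectors over a DFT vocabulary, followed by KNN classification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the DFT terms, their weights, the domain names `books` and `airfares`, the use of cosine similarity, and the helper names `interface_vector`, `cosine`, and `knn_classify`. It does not reproduce the paper's ontology-based DFT selection or its improved weighting method.

```python
from collections import Counter
import math

# Hypothetical DFT vocabulary with illustrative weights (not from the paper).
DFT_WEIGHTS = {
    "author": 1.0, "isbn": 1.5, "title": 0.5,
    "departure": 1.5, "airline": 1.5, "passengers": 1.0,
}
VOCAB = sorted(DFT_WEIGHTS)


def interface_vector(labels):
    """Map the text labels of a query interface to a weighted DFT vector."""
    counts = Counter(word for label in labels for word in label.lower().split())
    return [DFT_WEIGHTS[term] * counts[term] for term in VOCAB]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(vector, training, k=3):
    """Majority vote among the k training vectors most similar to `vector`."""
    ranked = sorted(training, key=lambda item: cosine(vector, item[0]), reverse=True)
    votes = Counter(domain for _, domain in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled query interfaces (assumed data, for illustration only).
TRAINING = [
    (interface_vector(["Title", "Author", "ISBN"]), "books"),
    (interface_vector(["Author", "ISBN"]), "books"),
    (interface_vector(["Departure city", "Airline"]), "airfares"),
    (interface_vector(["Departure date", "Passengers", "Airline"]), "airfares"),
]

if __name__ == "__main__":
    query = interface_vector(["Book title", "Author name"])
    print(knn_classify(query, TRAINING))
```

In this sketch the weights stand in for whatever the weighting method assigns: a term such as `isbn` that strongly indicates one domain contributes more to the vector than a generic term such as `title`, so it has more influence on which neighbours are nearest.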
Similar resources
A novel method based on a combination of deep learning algorithm and fuzzy intelligent functions in order to classification of power quality disturbances in power systems
Automatic classification of power quality disturbances is the foundation to deal with power quality problem. From the traditional point of view, the identification process of power quality disturbances should be divided into three independent stages: signal analysis, feature selection and classification. However, there are some inherent defects in signal analysis and the procedure of manual fe...
Efficient Method Based on Combination of Deep Learning Models for Sentiment Analysis of Text
People's opinions about a specific concept are considered as one of the most important textual data that are available on the web. However, finding and monitoring web pages containing these comments and extracting valuable information from them is very difficult. In this regard, developing automatic sentiment analysis systems that can extract opinions and express their intellectual process has ...
A Novel One Sided Feature Selection Method for Imbalanced Text Classification
The imbalance data can be seen in various areas such as text classification, credit card fraud detection, risk management, web page classification, image classification, medical diagnosis/monitoring, and biological data analysis. The classification algorithms have more tendencies to the large class and might even deal with the minority class data as the outlier data. The text data is one of t...
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...
A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification
In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...